Abstract

Background: Pluripotency is a fundamental property of early mammalian development but it is currently unclear
to what extent its cellular mechanisms are conserved in vertebrates or metazoans. POU5F1 and POU2 are the two
principal members of the class V POU domain family of transcription factors, which are thought to have a conserved
role in the regulation of pluripotency in vertebrates as well as in germ cell maintenance and neural patterning. They
have undergone a complex pattern of evolution that is poorly understood and controversial.

Results: By analyzing the sequences of POU5F1, POU2 and their flanking genes, we provide strong indirect
evidence that POU5F1 originated at least as early as a common ancestor of gnathostomes but became extinct in a
common ancestor of teleost fishes, while both POU5F1 and POU2 survived in the sarcopterygian lineage leading to
tetrapods. Less divergent forms of POU5F1 and POU2 appear to have persisted among cartilaginous fishes.

Conclusions: Our study resolves the controversial evolutionary relationship between teleost pou2 and tetrapod
POU2 and POU5F1, and shows that class V POU transcription factors have existed at least since the common
ancestor of gnathostome vertebrates. It provides a framework for elucidating the basis for the lineage-specific
extinctions of POU2 and POU5F1.



Background 

Loss of potency during differentiation is fundamental to the development of complex metazoans.
Pluripotent embryonic cells are able to give rise ultimately to all tissues of the adult body.
In at least some mammals, pluripotency can be "captured" in vitro in the form of indefinitely
self-renewing embryonic stem (ES) cells. Thus ES cells can serve as a model for the
differentiation of their in vivo counterparts into ectoderm, mesoderm and endoderm derivatives.

POU5F1 (also called OCT4 or OCT3/4) is a central regulator of pluripotency in mammals. In the
mouse, deletion of Pou5f1 causes loss of pluripotency in the inner cell mass and differentiation
to trophoblast, revealing its earliest developmental role [1]. POU5F1 is also a potent
reprogramming factor capable of facilitating the derivation of induced pluripotent stem (iPS)
cells [2,3]. Conditional knockout of Pou5f1 in mouse primordial germ cells results in their
apoptosis [4], showing that the role of POU5F1 is not restricted to preventing differentiation.

POU2 is a vertebrate paralog of POU5F1 that has been best characterized in zebrafish. Curiously,
some vertebrate lineages, such as salamanders, marsupials and monotremes, have preserved both
POU2 and POU5F1 in their genomes, while in other vertebrates one or the other gene has become
extinct [5-7]. Thus squamate reptiles and eutherian mammals have only POU5F1, while birds and
frogs have only POU2 (called POUV in birds). In Xenopus, POU2 is present as three tandem copies:
OCT25, OCT60 and OCT91.

For reasons that are not fully clear, teleost pou2 was recently renamed pou5f1 despite multiple
pieces of evidence for a closer affinity to POU2 orthologs of tetrapods. Onichtchouk [8] argued
that since orthologous genes are defined "as originating from a single ancestral gene in the
last common ancestor of the compared genomes", teleost pou2 is orthologous to mammalian POU5F1.
However, by the same argument, teleost pou2 is also orthologous to tetrapod POU2 orthologs, thus
obviating the need for a name change. Teleost pou2 shares more sequence similarity as well as
conserved synteny with tetrapod POU2 [5,6], but perhaps more importantly, it was not proven
whether the duplication event giving rise to each paralog occurred after or before the common
ancestor of tetrapods and teleost fishes. If the latter, POU5F1 must have become extinct in
teleosts, as it has in some tetrapod lineages such as birds and frogs.

POU5F1 and POU2 share a five-exon genomic structure that is characteristic of the class V POU
family. Exons 1 and 5 encode the poorly conserved N- and C-terminal transactivation domains,
respectively, while Exons 2 to 4 encode the highly conserved POU domain, which comprises the
POU-specific domain and the POU-homeodomain separated by a short linker region [9-11].

Results 

Newly identified POU2 and POU5F1 orthologs in vertebrates

To gain insight into the origins of the class V POU family of transcription factors in
vertebrates, BLAST searches were performed for sequences homologous to mammalian POU2 and
POU5F1. Previously unreported orthologs of POU5F1 were identified from a large number of
vertebrate species, including the painted turtle (Chrysemys picta bellii), Indian python
(Python molurus) and coelacanth (Latimeria chalumnae). POU2 orthologs were also identified in
many species, including the alligator (Alligator mississippiensis), painted turtle, coelacanth
and spotted gar (Lepisosteus oculatus).

The avian POU2 ortholog, POUV, was identified in genome assemblies of the turkey (Meleagris
gallopavo), medium ground finch (Geospiza fortis) and budgerigar (Melopsittacus undulatus),
adding to the previously identified orthologs from chicken [12] and zebra finch [5]. Conserved
open reading frames orthologous to chicken Exon 1 could not be identified in other avian
species. As chicken Exon 1 was previously identified as unlikely to be homologous to Exon 1 from
non-avian orthologs [5], all available avian genomes were re-examined. Low stringency BLAST
searches identified a single sequence (Ti 224571611) from the chicken whole genome shotgun (WGS)
trace archives with homology to the proximal promoter and 5' part of Exon 1 of non-avian POU2
orthologs (see below). In addition, a primordial germ cell-derived partial chicken EST (GenBank
accession DR410403) included sequence with clear homology to the 3' part of Exon 1 from
non-avian POU2 orthologs. The apparent absence of both the proximal POU2 promoter and the
"canonical" Exon 1 in other birds is probably due to gaps in their respective genome assemblies,
suggesting that features of this region impart recalcitrance to sequencing. We conclude that the
previously published cDNA for chicken POUV represents a rare or non-canonical chicken-specific
transcript (retaining the first intron) that was selectively isolated due to the PCR-based
methods used.

Alignment of a broad selection of POU2 and POU5F1 translated sequences (Additional file 1)
showed almost no conservation within the N-terminal region between paralogs. However, a short
motif with the consensus sequence (K/R)XWYXF was moderately well conserved in both POU2 and
POU5F1 (Figure 1), the first time a sequence signature conserved in the N-terminal domain of all
family members has been identified. The previously noted N-terminal sequence MAGH and the
deletion of a single arginine residue within the POU-homeodomain [5], as well as an aspartic
acid instead of glutamic acid at a site within the POU-specific domain identified by Ye and
colleagues [13], were among the few fully conserved signatures specifically characterizing
POU5F1 orthologs. We also noted that a second single-residue deletion in the linker region
separating the POU-specific and POU-homeodomains of POU5F1 is specific to Boreoeutheria, as it
was not present in the elephant (Afrotheria), armadillo (Xenarthra) or any other vertebrate
(Figure 1), suggesting a modification in function of POU5F1 among some eutherians.
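
The consensus (K/R)XWYXF can be read as a simple pattern: lysine or arginine, any residue, tryptophan, tyrosine, any residue, phenylalanine. The following is a minimal sketch of how such a motif could be located in a translated sequence. The function name and the toy sequence are illustrative assumptions and are not taken from the study, which does not describe how the motif was searched for.

```python
import re

# (K/R)XWYXF: K or R, any residue (X), W, Y, any residue (X), F.
# X is modelled here as any uppercase letter, which is an assumption.
MOTIF = re.compile(r"[KR][A-Z]WY[A-Z]F")


def find_motif(protein_seq):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(protein_seq.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real POU2 or POU5F1 N-terminus.
    toy = "MSTLLEDFRAWYEFGG"
    print(find_motif(toy))
```

A pattern this short will also match by chance in unrelated proteins, so a hit is only informative in the context of an alignment of the N-terminal region.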

A multigenic duplication gave rise to POU2 and POU5F1

To gain further insight into the evolution of POU2 and POU5F1 and to confirm orthology where
possible, we examined their synteny with other genes in available vertebrate genome assemblies
(summarized in Figure 2). As previously reported [5,6], POU2 is flanked by orthologs of NPDC1
and FUT7 in all vertebrate genomes for which synteny could be determined. In the coelacanth and
turtle genomes, POU5F1 is flanked by a previously unreported paralog of NPDC1, a 9-exon gene
expressed in differentiating neuronal cells [14,15], which we call NPDC1L (NPDC1-like). This
indicates that the original duplication event that generated POU5F1 and POU2 was of a multigenic
region. A search for NPDC1/NPDC1L homologs in other vertebrates identified sequences in several
squamate reptiles, including python, anole lizard (Anolis carolinensis) and Schlegel's Japanese
gecko (Gekko japonicus). Phylogenetic analysis showed that these sequences represent NPDC1L and
not NPDC1 (Figure 3 and Additional file 2). Thus extinction of POU2 in squamate reptiles [5] was
apparently associated with deletion of a larger, multigenic region that also included NPDC1. An
additional rearrangement in the mammalian lineage resulted in DDX39B lying upstream of POU5F1
and the possibly simultaneous extinction of NPDC1L. In the common ancestor of therian mammals
(after their split with monotremes), the H2 major histocompatibility complex was inserted
between DDX39B and POU5F1 (Figure 2).

The duplication that gave rise to POU5F1 occurred in a common ancestor of gnathostomes

The apparent absence of POU5F1 in all non-sarcopterygian (for example, teleost fish) genomes at
first glance suggested that the origin of POU5F1 by duplication of POU2 is specific to the
sarcopterygian lineage, or at least cannot be proven otherwise. However, the demonstration
(above) that the duplication included at least one flanking gene, NPDC1/NPDC1L, provided an
alternative strategy for determining its timing. We, therefore, searched for homologs of NPDC1
and NPDC1L in cartilaginous fishes (class Chondrichthyes), focusing on Exons 5 to 9 as these are
the best conserved. The identified sequences are summarized in Figure 4. Three whole genome
shotgun (WGS) contigs were identified from the elephantfish (Callorhinchus callorynchus,
subclass Holocephali), which included Exons 5 to 6, 7 to 8 and 9, respectively. These presumably
form part of a common gene but this was not assumed for the purpose of this analysis. Six WGS
contigs were identified from the little skate (Leucoraja erinacea, subclass Elasmobranchii),
each containing a single exon. These included two homologs of Exon 5, one of Exon 6, two of
Exon 8 and one of Exon 9. Thus the little skate genome appears to contain at least two homologs
of NPDC1/NPDC1L. Lastly, multiple overlapping expressed sequence tags (ESTs) from the spiny
dogfish (Squalus acanthias, subclass Elasmobranchii) were identified, which together spanned
almost the full coding region. These were combined in silico to produce a single sequence for
analysis.

Figure 1. Sequence signatures among class V POU family proteins. Within the N-terminal domain,
the only motif present in all family members is boxed (A). Within the POU domain, position (B)
is a glutamic acid residue in all POU2 orthologs and an aspartic acid residue in all POU5F1
orthologs. Single-residue deletions at (C) and (D) are specific to boreoeutherian POU5F1 and all
POU5F1 orthologs, respectively.

To maximize statistical power, we first compared the translated dogfish sequence (the only
chondrichthyan sequence spanning Exons 5 to 8) with NPDC1 and NPDC1L orthologs of other species,
including a tunicate (Ciona savignyi) NPDC1/NPDC1L homolog as an outgroup. The dogfish sequence
clustered with NPDC1L orthologs with a significant bootstrap value using three different methods
for generating consensus phylogenetic trees (maximum parsimony, maximum likelihood and
neighbor-joining) (Figure 3A). In a comparison with the sequences from coelacanth, a species
with both NPDC1L and NPDC1 (to control for lineage-specific differences in divergence rate), the
dogfish sequence was clearly more similar to NPDC1L than to NPDC1, indicating that its
clustering with NPDC1L in the consensus trees was not simply due to more rapid divergence from
an ancestral NPDC1-like sequence (Figure 3E). This indicated that a gene more similar to NPDC1L
than to NPDC1 has existed since at least as early as the common ancestor of Chondrichthyes and
Osteichthyes, and that duplication of an NPDC1/NPDC1L ancestral gene must have occurred before
the split between Sarcopterygii and Actinopterygii, since both groups have NPDC1 orthologs that
are more similar to each other than to NPDC1L. To examine whether the duplication occurred even
earlier in a common ancestor of Chondrichthyes and Osteichthyes, we performed phylogenetic
analyses of the other chondrichthyan sequences from elephantfish and little skate (Figure 3B-D).
Both of the elephantfish sequences (spanning Exons 5 to 6 and 7 to 8, respectively) clustered
with NPDC1 orthologs and were separate from the dogfish sequence and NPDC1L orthologs,
regardless of the exon analyzed or the method used. For the little skate, one of the two Exon 5
sequences and one of the two Exon 8 sequences clustered with the dogfish sequence regardless of
the analysis method and with significant bootstrap values for three of the six analyses (one for
Exon 5 and two for Exon 8), indicating that these sequences are orthologous to the dogfish
sequence. The other little skate Exon 5 and Exon 8 sequences, plus the Exon 6 sequence, each
clustered with an elephantfish sequence to the exclusion of all other sequences in almost every
case (8/9), with only one (non-significant) exception (Exon 5, maximum parsimony; Figure 3B).
Bootstrap values for this clustering were significant in three of the other eight analyses
(Exon 6, maximum parsimony; Exon 8, maximum likelihood and neighbor-joining). These results
strongly suggested that chondrichthyans collectively have both NPDC1 and NPDC1L paralogs and
that both are present in the little skate genome. To exclude the possibility that the putative
NPDC1 ortholog (in elephantfish and little skate) is a chondrichthyan-specific paralog of the
dogfish sequence, we compared the two elephantfish sequences (Exons 5 to 6 and 7 to 8) to
coelacanth NPDC1 and NPDC1L (Figure 3E). Both elephantfish sequences were more similar to
coelacanth NPDC1 than to either the dogfish sequence or coelacanth NPDC1L, strongly arguing
against a scenario in which the elephantfish sequences are derived from a
chondrichthyan-specific duplication of an ancestral NPDC1/NPDC1L precursor that was more similar
to extant NPDC1L orthologs than to NPDC1 orthologs. It may thus be concluded that orthologs of
both NPDC1 and NPDC1L are present among cartilaginous fishes and, therefore, that the
duplication event giving rise to POU2 and POU5F1 must have occurred at least as early as a
common ancestor of extant gnathostomes.

Figure 2. Synteny at the POU5F1 and POU2 loci in sarcopterygians. The synteny of POU5F1 and POU2
with other genes in extant species is shown on the right-hand side. Proposed ancestral genomic
rearrangements that explain the current synteny are shown on the left-hand side. (A) A
multigenic duplication in a sarcopterygian ancestor gave rise to POU5F1 and NPDC1L, causing
POU5F1 to flank TCF19 and POU2 to flank FUT7. (B) In a common ancestor of mammals, deletion of
NPDC1L caused POU5F1 to flank DDX39B. The orientation of FUT7 also was inverted. (C) In a common
ancestor of therian mammals, the H2 major histocompatibility complex became inserted between
POU5F1 and DDX39B.

Figure 3. Phylogenetic analysis of NPDC1/NPDC1L homologs in gnathostomes. (A-D) Phenograms from
phylogenetic analyses of alignments of translated NPDC1, NPDC1L and chondrichthyan sequences,
analyzed by maximum parsimony, maximum likelihood and neighbor-joining methods, using a tunicate
NPDC1/NPDC1L homolog as an outgroup. Alignments are presented in Additional file 2. Only
significant bootstrap values (>70%) are shown and those relevant to the text are circled in red.
Chondrichthyan sequences are boxed. NPDC1L sequences, including putative chondrichthyan
orthologs, are bracketed. (E) Percentage of similarity/identity values comparing translated
elephantfish and dogfish sequences to each other and to coelacanth NPDC1 and NPDC1L. The
percentages highlighted in bold show that the elephantfish sequences are most similar to
coelacanth NPDC1 while the dogfish sequence is most similar to coelacanth NPDC1L, indicating
respective orthology when combined with the phenogram data.
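
The pairwise comparison against coelacanth NPDC1 and NPDC1L rests on percentage identity between aligned translated sequences. The text does not state how the software handled gaps, so the sketch below assumes that columns containing a gap in either sequence are excluded from the comparison. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap.

    Excluding gapped columns is an assumption; other tools may count them.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def closer_reference(query, references):
    """Return the name of the reference with the highest identity to the query."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    refs = {"paralog_1": "MKTALVQE", "paralog_2": "MRSAIVNE"}
    print(closer_reference("MKTALVNE", refs))
```

Comparing the query with both paralogs from a single species, as done with coelacanth, keeps the two references at the same evolutionary distance from the query lineage.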

Putative POU2 and POU5F1 orthologs are present in chondrichthyans

Since the duplication that gave rise to NPDC1 and NPDC1L can be reasonably assumed to have
occurred in a common ancestor of cartilaginous fishes and other jawed vertebrates, we searched
chondrichthyan databases thoroughly for homologs of POU2 and POU5F1. The identified sequences
are summarized in Figure 4. In the elephantfish, we identified a single WGS contig encoding
Exons 2 and 3 and a separate contig encoding Exon 5. In the little skate, we identified a
partial sequence for Exon 1 and two homologs of each of Exons 3, 4 and 5, all on separate
contigs. Thus, while it was unclear which of the identified exons collectively form part of a
common gene, at least two genes encoding class V POU domain transcription factors exist in the
little skate genome. Although synteny with other genes could not be determined from either of
the chondrichthyan genome assemblies, various sequence signatures mostly resembled POU2 rather
than POU5F1, including a lack of the single arginine deletion within the POU-specific domain of
POU5F1 (Figure 1) [5]. This could be explained by a lineage-specific duplication of POU2 in the
little skate, similar to the tandem POU2 triplication found in Xenopus. However, the presence of
a single homolog of both NPDC1/NPDC1L and POU2/POU5F1 in the elephantfish but two homologs of
each in the little skate suggested the presence of both POU2 and POU5F1 orthologs in the latter
species. To test this, we performed phylogenetic analysis of translated sequences for each exon
(Figure 5 and Additional file 3). For the elephantfish sequences, Exons 2, 3 and 5 generally
clustered with POU2 orthologs and this was highly significant for one analysis of Exon 5
(maximum likelihood; bootstrap value 90%). This suggested that the elephantfish contains a
single ortholog of POU2. The elephantfish sequences always clustered with one little skate
sequence to the exclusion of all others (significant bootstrap value for all three analyses of
each exon), indicating orthology. The remaining little skate sequences (Exon 1, Exon 3, Exon 4
(x2) and Exon 5) generally clustered non-significantly with either POU2 or POU5F1 orthologs,
with two notable exceptions. The little skate Exon 1 sequence clustered with POU5F1 orthologs
for all three methods of analysis. For one analysis this was highly significant (maximum
parsimony, bootstrap 94%). One of the little skate Exon 5 sequences clustered non-significantly
with POU2 orthologs for one analysis method (maximum parsimony) but clustered highly
significantly with POU5F1 orthologs for the other two methods (maximum likelihood, 90%;
neighbor-joining, 92%).
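
A bootstrap value such as those quoted above is the percentage of resampled data sets whose tree recovers a given grouping, with values above 70% treated as significant in the figure legends. The following is a minimal sketch of that calculation, assuming each replicate tree has already been reduced to a set of groupings (frozensets of sequence names). This representation, the function name and the toy labels are illustrative assumptions and do not reflect the output format of any phylogenetics package.

```python
SIGNIFICANCE_THRESHOLD = 70.0  # percent, as used in the figure legends


def clade_support(replicate_clades, clade):
    """Percentage of bootstrap replicates whose tree contains `clade`.

    replicate_clades: list with one set of frozensets per replicate tree.
    clade: iterable of sequence names forming the grouping of interest.
    """
    if not replicate_clades:
        raise ValueError("at least one replicate is required")
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    # Four toy replicates with hypothetical sequence labels.
    replicates = [
        {frozenset({"skate_exon", "ortholog_a"})},
        {frozenset({"skate_exon", "ortholog_a"})},
        {frozenset({"skate_exon", "ortholog_b"})},
        {frozenset({"skate_exon", "ortholog_a"})},
    ]
    support = clade_support(replicates, {"skate_exon", "ortholog_a"})
    print(support, support > SIGNIFICANCE_THRESHOLD)
```

For unrooted trees the unit of support is strictly a bipartition of the sequences rather than a clade, so a fuller treatment would compare splits instead of single name sets.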

Combined, the above data suggest that orthologs of both POU2 and POU5F1 exist among
cartilaginous fishes. Although the identity of every WGS contig cannot be assigned with
confidence, evidence suggests that the little skate has both a POU2 and a POU5F1 ortholog, while
the elephantfish has only a POU2 ortholog. This is consistent with the presence of both NPDC1
and NPDC1L orthologs in the little skate, but only an NPDC1 ortholog in the elephantfish.

Discussion 

Our data show that the duplication that gave rise to POU5F1 and POU2 occurred in a common
gnathostomal ancestor. This can be deduced by combining two crucial pieces of evidence. First,
conserved synteny shows that the duplication was multigenic and also gave rise to the paralogs
NPDC1 and NPDC1L. Second, orthologs of both NPDC1 and NPDC1L were identified in cartilaginous
fishes. Consistent with this deduction, we identified sequences in cartilaginous fishes that
appear to correspond to either POU2 or POU5F1. Orthologs of both POU2 and POU5F1 are likely to
still be present in the genome of the little skate, although their sequences appear less
divergent from each other than they are in higher vertebrates. We also predict that an ortholog
of POU5F1 is present in the spiny dogfish, since this species also retains an ortholog of
NPDC1L, but POU5F1 is presumably extinct in the elephantfish lineage.

Figure 4. Summary of genomic and expressed sequences identified in chondrichthyan genomes.
(A) Sequences homologous to Exons 5 to 9 of NPDC1 and NPDC1L. (B) Sequences homologous to
Exons 1 to 5 of POU2 and POU5F1.

Figure 5. Phylogenetic analysis of POU2/POU5F1 homologs in gnathostomes. Translated sequences of
individual exons were analyzed by maximum parsimony, maximum likelihood and neighbor-joining
methods and displayed as unrooted consensus trees. Alignments for the analyses are presented in
Additional file 3. Only significant bootstrap values (>70%) are shown and those relevant to the
text are circled in red.

A proposed model for the evolution of the POU2/POU5F1 family in vertebrates, based on the
present data, is summarized in Figure 6. Turtles, coelacanths and probably at least some
elasmobranch fishes all have orthologs of both POU2 and POU5F1, joining marsupials, monotremes
and salamanders [5-7] as the only known lineages that have retained both genes. Extinction of
POU5F1 in birds and crocodilians may have been a single event dating to a common archosaurian
ancestor. The absence of POU2 in both anole and python genomes also suggests a single extinction
event in a common ancestor of squamate reptiles.

Contrary to a recent assertion [8], our study provides clear evidence that the pou2 gene of
teleost and other actinopterygian fishes is a bona fide ortholog of tetrapod POU2 and not of
POU5F1. Its recent renaming to pou5f1 (RefSeq-ID NM_131112.1) by the zebrafish nomenclature
committee is, therefore, misleading. POU5F1 became extinct possibly in a common ancestor of
actinopterygians, or at least of teleost fishes. This finding is important because misleading
nomenclature can potentially lead to misleading assumptions regarding evolutionary conservation
versus divergence of the respective roles of POU2 and POU5F1.

Figure 6. Model for the evolution of POU5F1 and POU2 in vertebrates. POU2 and POU5F1 arose by
duplication of an ancestral gene in a common ancestor of Osteichthyes and Chondrichthyes. One or
other gene then became extinct (indicated by dashed lines) in some lineages. The dogfish and
little skate are representatives of Elasmobranchii.

Orthologs of POU2 and POU5F1 from various vertebrates have been tested for their ability to
maintain pluripotency in mouse ES cells or to generate mouse or human iPSCs. Non-eutherian
POU5F1 orthologs from axolotl [7] and platypus [6] both have this ability. POU2 orthologs from
opossum, chicken, Xenopus, axolotl and medaka are also able to maintain or induce pluripotency
[6,7,12,16,17], even in species that have retained both paralogs (axolotl and opossum).
Surprisingly, although medaka pou2 can maintain mESC pluripotency, pou2 of another teleost fish,
zebrafish, cannot [7]. Nevertheless, the conservation in function of class V POU family members
despite very poor sequence conservation in the transactivation domains can perhaps be expected,
considering that deletion of either (but not both) of the N- or C-terminal domains did not
affect the ability of mouse POU5F1 to maintain ES cell pluripotency [18]. Maintaining
pluripotency in ES cells probably serves as a model for only a limited proportion of the roles
POU2 and POU5F1 serve in vivo. Thus, although there is strong evidence for an ancient role for
the common ancestor of POU2 and POU5F1 at least in the maintenance of pluripotency, deducing
distinct functions and roles between various POU2 and POU5F1 orthologs will probably require in
vivo assays other than ES cell complementation. This would include deducing the function of the
conserved (K/R)XWYXF motif in the N-terminal domain.

A general, although not universal, pattern appears to be that POU2 orthologs are more widely
expressed in non-germline and non-pluripotent tissues than are POU5F1 orthologs. In marsupials,
POU2 transcripts are detectable by RT-PCR in a wide range of adult tissues, whereas POU5F1
expression is restricted to the germ line and early conceptuses [5,19]. Nevertheless, POU2 is
also differentially expressed in early tammar conceptuses [5] and protein immunolocalization
suggests that POU2 is a more specific marker than POU5F1 of very early epiblast [20].
Interestingly, in the sturgeon, pou2 transcripts were also detected in many adult tissues [13].

POU2 orthologs seem to have a more important role than POU5F1 in early neural development. In
the axolotl, POU2 but not POU5F1 is expressed specifically in the early neural plate and later
in the developing hindbrain [7], in a pattern similar to chicken POUV [12], Xenopus OCT25 and
OCT91 [17] and zebrafish pou2 [21-25], but not medaka pou2 [26].

The pattern of germ cell expression is also inconsistent among POU2 and POU5F1 orthologs.
Marsupial POU5F1 but not POU2 expression was detected by in situ hybridization in primordial
germ cells and early spermatogonia [5,19], whereas both axolotl paralogs are expressed in germ
cells [7]. Germ cell expression has also been reported for chicken POUV (POU2) and Xenopus OCT60
[27] and among teleosts for medaka [26] and cod [28] but not for zebrafish. Nevertheless, all
POU5F1 orthologs that have been examined are expressed in germ cells, which may be significant.
Two modes of germ cell specification are recognized among vertebrates: predetermined (germ
plasm) and inductive (regulative). In the predetermined mode, maternally inherited germ plasm is
partitioned during cleavage to a subset of cells, which are then specified to become germ cells.
In the inductive mode, there is no germ plasm and germ cells become specified by inducing
signals from neighboring cells. The inductive mode is considered ancestral, with the
predetermined mode independently derived in birds, frogs and teleost fishes [29]. The
predetermined mode was proposed to be correlated with a derived mode of mesoderm induction
[30,31] as well as with a more POU5F1-like class V POU transcription factor [32], although this
preceded knowledge of the paralogous relationship between POU2 and POU5F1 among vertebrates
[5,6]. We thus hypothesized that inductive germ cell specification is specifically correlated
with the presence of a POU5F1 ortholog, irrespective of the presence of POU2 [5]. Our present
data are still largely consistent with this hypothesis. Evidence suggests that turtles have
inductive germ cell specification [33], while retaining POU5F1 (and POU2). To our knowledge, no
data exist on the mode of germ cell specification in crocodilians, which would be expected to
share a similar mode with birds. Evidence suggests that the sturgeon (a basal actinopterygian)
lacks germ plasm and is thus likely to have the inductive mode of germ cell specification
[31,34]. The sturgeon genome has not been sequenced, so it is possible that it has a POU5F1
ortholog in addition to its previously reported POU2 ortholog [13]. Indeed, Johnson et al. [32]
do refer to an unpublished "Oct-4" sequence from sturgeon. In the sequenced genome of the
spotted gar (a less basal, non-teleost actinopterygian), we found all five exons of a POU2
ortholog but no exons corresponding to a POU5F1 ortholog. Thus POU5F1 presumably became extinct
in a common ancestor of gars and teleost fishes. To our knowledge, the mode of germ cell
specification of gars has not been investigated. Early studies cited by Extavour and Akam [29]
drew conflicting conclusions regarding the mode of germ cell specification in elasmobranch
fishes, and no studies have examined fishes of the subclass Holocephali (for example,
elephantfish). Further studies examining the mode of germ cell specification in several of the
above lineages will provide powerful data to test the intriguing notion that the acquisition of
predetermined germ cell specification permits or even drives the loss of POU5F1 [32].

Conclusions

Our study resolves the controversial evolutionary relationship between teleost pou2 and tetrapod
POU2 and POU5F1. It shows that class V POU transcription factors have existed at least since the
common ancestor of gnathostome vertebrates and provides a framework for elucidating the basis
for the lineage-specific extinctions of POU2 and POU5F1, which is likely to be informative for
understanding their roles in development.

Methods 

General sequence analysis was performed using MacVector software, version 12.73 (MacVector,
Inc.; Cary, North Carolina, USA). Sources of all sequences are detailed in Additional file 4.
Sequences were selected to provide a broad range of taxonomic groups. The three Xenopus POU2
orthologs (OCT91, OCT60 and OCT25) were not included in analyses since they display considerable
sequence divergence, which could be related to redundancy among them. All alignments were
performed on translated sequences using the MUSCLE algorithm (with default parameters) in
MacVector. Subsequent manual adjustment was only performed for the full POU2/POU5F1 alignment.
Phylogenetic analyses on aligned sequences were performed using PHYLIP version 3.69 [35], using
the maximum parsimony (100 replicates, 10 jumbles), maximum likelihood without the assumption of
a molecular clock (1,000 replicates, 10 jumbles) and neighbor-joining (1,000 replicates) methods
with default parameters.
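
The replicate counts above refer to bootstrap resampling, in which alignment columns are drawn at random with replacement to build each pseudo-replicate data set before a tree is inferred from it. The sketch below illustrates only that resampling step on a toy alignment. It is not the PHYLIP implementation, and the function names, seed handling and sequences are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name to an aligned sequence (equal lengths).
    rng: a random.Random instance, so that results are reproducible.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same sampled column indices are applied to every sequence,
    # so each resampled column is still an intact alignment column.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return a list of n_replicates resampled alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Toy alignment with hypothetical labels.
    toy = {"seq_a": "MKTALV", "seq_b": "MRTAIV", "seq_c": "MKSALI"}
    for replicate in bootstrap_replicates(toy, 3):
        print(replicate)
```

Each replicate would then be passed to the chosen tree-building method, and the resulting trees summarized as a consensus with bootstrap values.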

Additional files